Video sequence processing

ABSTRACT

An array of pixel-to-pixel dissimilarity values is analysed to select a pixel which has a low pixel-to-pixel dissimilarity value and which has neighbouring pixels which have a low pixel-to-pixel dissimilarity value. Each pixel-to-pixel dissimilarity value represents a difference between the value of a pixel in a first spatial pixel array representing a first image and the value of a corresponding pixel in a second spatial pixel array representing a second image. The pixels neighbouring the current pixel are divided into at least two polar sectors. The current pixel is selected when the current pixel and the pixels of at least one polar sector have low dissimilarity values.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation application of U.S. patent application Ser. No. 13/832,764 filed on Mar. 15, 2013, titled “Method and Apparatus for Analysing an Array of Pixel-to-pixel Dissimilarity Values by Combining Outputs of Partial Filters in a Non-linear Operation” (attorney docket no. 087805-9068 US00), which claims priority to GB Patent Application No. 1206065.3 filed on Apr. 4, 2012, the entire contents of both of which are incorporated herein by reference.

FIELD OF INVENTION

This invention relates to video sequence processing, particularly in connection with motion estimation of video signals.

BACKGROUND OF THE INVENTION

In the estimation of motion vectors between video frames, motion vectors are assigned to pixels, or blocks of pixels, in each frame and describe the estimated displacement of each pixel or block in a next frame or a previous frame in the sequence of frames. In the following description, the motion estimation is considered to be “dense”, meaning that a motion vector is calculated for every pixel. The definition of “dense” may be widened to cover the calculation of a motion vector for each small block in the picture, for each pixel in a subsampled version of the picture, or for each small region of arbitrary shape within which the motion is expected to be uniform. The invention can be applied with trivial modification to these wider cases.

Motion estimation has application in many image and video processing tasks, including video compression, motion-compensated temporal interpolation for standards conversion or slow-motion synthesis, motion-compensated noise reduction, object tracking, image segmentation, and, in the form of displacement estimation, stereoscopic 3D analysis and view synthesis from multiple cameras.

Most applications of motion estimation involve the “projection” (also described as “shifting”) of picture information forward or backward in time according to the motion vector that has been estimated. This is known as “motion-compensated” projection. The projection may be to the time instant of an existing frame or field, for example in compression, where a motion-compensated projection of a past or future frame to the current frame instant serves as a prediction of the current frame. Alternatively, the projection may be to a time instant not in the input sequence, for example in motion-compensated standards conversion, where information from a current frame is projected to an output time instant, where it will be used to build a motion-compensated interpolated output frame.

Some of the terminology used in describing motion estimation systems will now be described. FIG. 1 shows one-dimensional sections through two successive frames in a sequence of video frames. The horizontal axis of FIG. 1 represents time, and the vertical axis represents position. Of course, the skilled person will recognise that FIG. 1 is a simplification and that motion vectors used in image processing are generally two dimensional. The illustrated frames are: a previous or reference frame (101); and, the current frame (102). A motion vector (104) is shown assigned to a pixel (103) in the current frame. The motion vector indicates a point (105) in the reference frame which is the estimated source, in the reference frame, of the current frame pixel (103). This example shows a backward vector. Forward vectors may also be measured, in which case the reference frame is the next frame in the sequence rather than the previous frame.

The following descriptions assume that these frames are consecutive in the sequence, but the described processes are equally applicable in cases where there are intervening frames, for example in some compression algorithms. Temporal samples of an image will henceforth be referred to as fields, as would be the case when processing interlaced images. However, as the skilled person will appreciate, in non-interlaced image formats a temporal sample is represented by a frame; and, fields may be ‘de-interlaced’ to form frames within an image process. The spatial sampling of the image is not relevant to the discussion which follows.

An example of an algorithm that calculates motion vectors is disclosed in GB2188510. This algorithm is summarised in FIG. 2 and assigns a single vector to every pixel of a current field in a sequence of fields. The process of FIG. 2 is assumed to operate sequentially on the pixels of the current field; the pixel whose vector assignment is currently being determined will be referred to as the current pixel. The current field (202) and the previous field (201) are applied to a phase correlation unit (203) which calculates a “menu” (204) for every pixel of the current field consisting of a number (three in this example) of candidate motion vectors. Each candidate vector controls a respective member of a set of shift units (205) which, for every pixel in the current field, displaces the previous field (201) by the respective candidate vector to produce a shifted pixel corresponding to the current pixel of the current field in the respective member of the set of displaced fields (206).

A set of error calculation units (207) produces a set of error values (208), one error value for every menu vector for every pixel of the current field. Each of the error calculation units (207) subtracts the respective one of the displaced fields (206) from the current field (202) and rectifies the result to produce a field of difference magnitudes, which are known as displaced field differences or “DFDs”. Each of the error calculation units (207) spatially filters its respective field of DFDs in a filter centred on the current pixel to give an error value for that pixel and menu vector. This spatially filtered DFD is the error value for the respective current pixel and vector. The set of three error values (208) for the current pixel are compared in a comparison unit (209), which finds the minimum error value. The comparison unit (209) outputs a candidate index (210), which identifies the vector that gave rise to the minimum error value. The candidate index (210) is then applied to a vector selection unit (211) to select the identified candidate from the menu of vectors (204) as the respective output assigned vector (212) for the current pixel.
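By way of illustration only, the assignment process summarised above might be sketched in Python as follows. The function names, the use of a single menu shared by all pixels (the described system provides a menu per pixel), the (dy, dx) vector convention and the clamping at the field edges are all assumptions made for this sketch and are not taken from GB2188510.

```python
def displaced_field_difference(current, previous, vector):
    """Rectified difference between `current` and `previous` shifted by `vector`.

    `vector` is (dy, dx): the assumed offset from a current-field pixel to its
    estimated source in the previous field (a backward vector).
    """
    height, width = len(current), len(current[0])
    dy, dx = vector
    dfd = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clamp the source position to the field edges (an assumption).
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            row.append(abs(current[y][x] - previous[sy][sx]))
        dfd.append(row)
    return dfd


def assign_vectors(current, previous, menu, spatial_filter):
    """Pick, for every pixel, the menu vector with the smallest filtered DFD."""
    # One filtered error field per candidate vector.
    errors = [
        spatial_filter(displaced_field_difference(current, previous, vector))
        for vector in menu
    ]
    height, width = len(current), len(current[0])
    assigned = []
    for y in range(height):
        row = []
        for x in range(width):
            # Candidate index giving the minimum error at this pixel.
            best = min(range(len(menu)), key=lambda i: errors[i][y][x])
            row.append(menu[best])
        assigned.append(row)
    return assigned
```

Here `spatial_filter` stands for whichever DFD filter is in use; passing `lambda dfd: dfd` compares the unfiltered DFDs.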

An important property of DFDs will now be described. If a candidate motion vector for a pixel describes the true motion of that pixel, then we would expect the DFD to be small, and only non-zero because of noise in the video sequence. If the candidate motion vector is incorrect, then the DFD may well be large, but it might be coincidentally small. For example, a rising waveform in one field may match a falling waveform in the displaced field at the point where they cross. Alternatively, a pixel may be in a plain area or in a one-dimensional edge, in which case several motion vectors would give rise to a small or even a zero DFD value. This inconvenient property of DFDs is sometimes referred to as the “aperture problem” and leads to the necessity of spatially filtering the DFDs in order to take information from nearby pixels into account in determining the error value for a pixel.

In the example of FIG. 2, each error calculation block (207) filters the DFDs with a two-dimensional filter, a typical example of which is a 5×5 running-average filter. It is this rectified and filtered error that is used for comparison of candidate motion vectors. FIG. 3 illustrates the positions of the 25 samples involved in the running-average filter. The 5×5 arrangement of 25 samples comprises the samples within the rectangular filter window (302) and is centred on the current pixel position (301).
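A minimal Python sketch of such a running-average filter is given below for illustration. The function name and the treatment of field edges (averaging only the samples that fall inside the field) are assumptions; the source does not specify edge behaviour.

```python
def running_average(dfd, size=5):
    """Average each DFD sample with its neighbours in a size x size window."""
    height, width = len(dfd), len(dfd[0])
    half = size // 2
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            count = 0
            # Visit the window centred on (y, x), skipping positions that
            # fall outside the field (assumed edge handling).
            for j in range(y - half, y + half + 1):
                for i in range(x - half, x + half + 1):
                    if 0 <= j < height and 0 <= i < width:
                        total += dfd[j][i]
                        count += 1
            row.append(total / count)
        filtered.append(row)
    return filtered
```

A direct double loop is used for clarity; a hardware or production implementation would normally use running sums.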

Choosing the size of the two-dimensional DFD filter involves a trade-off between reliability and spatial accuracy of the resulting assigned motion vector field. If, on the one hand, the filter is large, then the effect of noise on the filtered error value is reduced and the filter is more likely to take into account nearby detail in the picture which might help to distinguish reliably between candidate motion vectors. However, a large filter is also more likely to take in pixel data from one or more objects whose motion is properly described by different motion vectors, in which case it will fail to give a low error value for any candidate motion vector, even for one that is correct for the pixel in question.

If, on the other hand, the filter is small, it is more likely to involve pixels from only one object and so is more likely to return a low error value for the correct motion vector. However, it will be less likely to reject wrong motion vectors and will be more susceptible to noise.

The inventors have observed that, for critical picture material, there is no choice of filter size which yields satisfactory performance in all aspects of reliability, noise immunity, spatial accuracy and sensitivity. However, the inventors have recognized that it is possible to design an improved displaced field difference filter which combines the reliability and noise immunity of a large conventional filter with the sensitivity and spatial accuracy of a small filter, while avoiding the disadvantages of each.

SUMMARY OF THE INVENTION

The invention consists of a method and apparatus for filtering displaced field differences arising from candidate motion vectors, characterised in that the filter window is decomposed into regions that are filtered separately and whose outputs are combined by a non-linear operation.

BRIEF DESCRIPTION OF THE DRAWINGS

An example of the invention will now be described with reference to the drawings in which:

FIG. 1 is a diagram showing current and previous frames in an image sequence and a backward motion vector extending from a pixel in the current frame;

FIG. 2 is a block diagram of apparatus for assigning backward motion vectors to pixels according to the prior art;

FIG. 3 is a diagram of a filter window according to the prior art;

FIG. 4 is a diagram of a set of filter windows according to a first embodiment of the invention;

FIG. 5 is a block diagram of an improved filter according to a first embodiment of the invention;

FIG. 6 is a diagram of a set of filter windows according to a second embodiment of the invention;

FIG. 7 is a block diagram of an improved filter according to a second embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

As explained in the introduction, a displaced field difference filter operates on a set of DFDs representing difference values between current field pixels and respective displaced field pixels for a particular motion vector. Typically the difference values are rectified prior to filtering so that the magnitudes of the errors are represented by the DFDs. The filter takes contributions from the DFDs for a number of pixels within a filter window surrounding a current pixel; the DFD for the current pixel may also be used. Contributions from these DFDs are used to form an error value for the current pixel.

The input DFD values being filtered arise from a candidate motion vector, or from a smoothly varying motion vector field, calculated by known methods. In the description that follows, the term “motion vector” refers either to a constant vector over a region or to a smoothly varying vector field.

Displaced field difference filters according to examples of the invention will now be described. In each case the filter output is an error value for a particular motion vector at a particular pixel position within a current field; this pixel position will be referred to as the current pixel. The filter input DFD values will be referred to as samples, and the DFD corresponding to the current pixel will be described as the current sample. The positions of samples correspond with the positions of the respective current field pixels used to calculate the respective DFDs.

The filter window of a first exemplary embodiment of the invention is illustrated in FIG. 4, to which reference is now directed. The filter is given access to a number of contributing samples surrounding the current sample (401). Only samples that are used by the filter are shown in FIG. 4; other samples in the vicinity of the current sample are not shown. Typically there will be intermediate, unused samples forming part of an orthogonal spatial sampling structure for the current field. The contributing samples are grouped into eight line segments (402 to 409) in a star pattern centred on the current sample (401). The choice of this pattern is a compromise between economy and ease of access to samples in a hardware implementation, and the need to cover a reasonably wide area surrounding the current sample. In this particular example, each line segment contains seven samples, though other sizes are possible without departing from the scope of the invention.

The object of the filter is to give a high output if the motion vector that gave rise to the contributing samples is the wrong motion vector for the position of the current sample (401), and to give a low output if the motion vector is correct. If we begin with the assumption that the validity or invalidity of a motion vector extends across the area covered by the star pattern, then a high sample value somewhere in the pattern constitutes evidence that the motion vector is incorrect, and a suitable nonlinear filtering operation would be to take the maximum of the sample values across the pattern. However, it is quite possible that a boundary between two differently moving objects, for example the line shown (410), will cross the area. In this case, if the motion vector that gave rise to the samples is the one describing the motion of the right-hand object, we would expect the samples to the right of the line to have low values and those to the left to have at least some high values. We observe that, if the eight line segments in the star pattern are grouped into pairs of diametrically opposite segments (402 with 403; 404 with 405; 406 with 407; and, 408 with 409) then one segment of each pair will be expected to contain low sample values. The operation of the first inventive filter is therefore to take maximum values in each line segment, and then to take the minimum of the two maxima within each pair. This operation produces four values, all of which we expect to be low if the motion vector is correct. A further operation of the filter is therefore to take the maximum of the four minima. Finally, it is important for spatial accuracy to take account of the current sample. This is done by combining its value with the output of the filter so far defined, for example by taking the mean square value.

An alternative description of the first exemplary inventive filter will now be given with reference to the block diagram in FIG. 5. The filter receives an input stream of samples (500) corresponding to the DFDs for a current field and a particular motion vector. The samples are ordered according to a scanning raster so that when they are passed through a chain of delay elements (510) suitable choices for the delay values give access to the 57 (in this example) samples at the locations shown in the star pattern of FIG. 4. The output of the delay chain (510) takes the form of eight sets (502 to 509) of seven samples each, where output (502) corresponds to line segment (402), output (503) to line segment (403), and so on, together with the central sample (501), corresponding to current sample (401). The maximum value of each of the eight sets is found in respective maximum-value calculation units (512) to (519). The resulting maximum values (522) to (529) are applied in pairs to minimum-value calculation units (532), (534), (536) and (538) so as to find the respective minimum values from diametrically-opposite filter window segments. The resulting minimum values (542), (544), (546) and (548) are applied to a maximum-value calculation unit (550) whose output (551) is combined (553) with the current sample (501) by taking the root-mean-square value, which forms the filtered DFD output (554).
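The max, min, max and root-mean-square stages described above can be sketched in Python as follows. This is a hedged illustration rather than the described hardware: the function name, the exact segment geometry (eight compass directions with unit steps, whereas FIG. 4 may space the samples differently) and the clamping at field edges are assumptions.

```python
import math

# Assumed geometry: eight compass directions as (dy, dx) steps, ordered so that
# DIRECTIONS[i] and DIRECTIONS[i + 4] are diametrically opposite.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1),
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def star_filter(dfd, y, x, segment_length=7):
    """Filtered error at (y, x) using a star pattern of eight line segments."""
    height, width = len(dfd), len(dfd[0])

    def sample(j, i):
        # Clamp positions to the field edges (an assumption).
        return dfd[min(max(j, 0), height - 1)][min(max(i, 0), width - 1)]

    # Maximum sample value along each of the eight line segments.
    segment_maxima = [
        max(sample(y + k * dy, x + k * dx) for k in range(1, segment_length + 1))
        for dy, dx in DIRECTIONS
    ]
    # Minimum of the two maxima within each diametrically opposite pair.
    pair_minima = [min(segment_maxima[i], segment_maxima[i + 4]) for i in range(4)]
    # Maximum of the four minima.
    surround = max(pair_minima)
    # Combine with the current sample by taking the root-mean-square value.
    centre = dfd[y][x]
    return math.sqrt((surround ** 2 + centre ** 2) / 2)
```

With a vertical object boundary just to the left of the current sample and low samples on its own side, every opposite pair contains one all-low segment, so the output stays low even though half the window holds high values.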

Possible variations of this filter will now be described. In a first variation, the eight maximum-value calculation units (512) to (519) are replaced by eight averaging units. This variation can improve the noise immunity of the filter. In a second variation, the subsequent maximum-value unit (550) is likewise replaced by an averaging unit.

It will be apparent to the skilled person that other choices of processing elements may also be used. For example, units (512) to (519) may calculate: a mean square value; a combination of the mean and the maximum; or, other rank-order values such as the second or third highest value. Similarly, unit (550) may also take: a mean square value; a combination of the mean and the maximum; or, the second highest value. Such decisions are a trade-off between robustness to noise and sensitivity to data, and between reliability and the capability of handling motion vector boundaries that are more complex in shape.

A displaced field difference filter according to a second exemplary embodiment of the invention will now be described. The second filter is more reliable than those previously described, at the cost of an increase in complexity. FIG. 6 shows the samples involved in the second filter, based on an example window size of 15×15. In place of the eight 7-sample line segments shown in FIG. 4, this filter has eight octants (602) to (609) each containing 28 samples. (In FIG. 6 the sample positions in alternate octants are indicated by open circles so as to indicate more clearly the allocation of samples to octants.) The average value of the samples within each octant is taken, and subsequent processing may be the same as that of the first filter.

Preferably, however, the final combining step (553) of FIG. 5 may be replaced by a linear combination of the output of the four-value mean (550 in FIG. 5) with the output of a conventional 5×5 running-average filter whose window (610) is also shown in FIG. 6.

The architecture of the second filter may be based on FIG. 5, with the output of delay chain (510) now consisting of eight sets of 28 samples. However, a more efficient implementation is as shown in FIG. 7, where the chain of delay elements and the mean-value calculations at its output are replaced by octant-shaped running-average filters which may be constructed, for example, as described in UK patent application 1113569.6, with additional simplifications that exploit the fact that the octants have shared boundaries.

Referring to FIG. 7, the input stream of samples (700) is applied to eight octant-shaped running-average filters (712) to (719) whose outputs (722) to (729) are applied in pairs to minimum-value calculation units (732), (734), (736) and (738) so as to find the respective minimum values from diametrically-opposite filter window segments. The resulting minimum values (742), (744), (746) and (748) are applied to an averaging unit (750) whose output (751) is linearly combined (753) with the output (752) of a 5×5 running-average filter (702) applied to a suitably delayed version (701) of the input (700), to produce a final filtered DFD output (754). A typical linear combination in block (753) is to add 75% of the output (751) of the averaging unit (750) to 25% of the output (752) of the 5×5 running-average filter (702).
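The second filter can be sketched along the same lines. The allocation of samples to octants in FIG. 6 is not reproduced here; the half-open angular partition below is an assumption chosen only because it yields eight non-overlapping octants of 28 samples each in a 15×15 window. The function names and the edge clamping are likewise assumptions, and the direct summation stands in for the more efficient running-average structure of FIG. 7.

```python
def octant_index(dy, dx):
    """Map a non-zero offset (dy, dx) to an octant 0..7 (assumed partition).

    Octants k and k + 4 are diametrically opposite.
    """
    quarter = 0
    # Rotate by 90 degrees until the offset lies in the half-open first
    # quadrant (dx > 0, dy >= 0); each rotation advances by two octants.
    while not (dx > 0 and dy >= 0):
        dy, dx = -dx, dy
        quarter += 1
    return 2 * quarter + (0 if dy < dx else 1)


def octant_filter(dfd, y, x, half=7, mix=0.75):
    """Filtered error at (y, x): octant means, pairwise minima, mean, 5x5 blend."""
    height, width = len(dfd), len(dfd[0])
    sums, counts = [0.0] * 8, [0] * 8
    box_sum, box_count = 0.0, 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # Clamp positions to the field edges (an assumption).
            j = min(max(y + dy, 0), height - 1)
            i = min(max(x + dx, 0), width - 1)
            value = dfd[j][i]
            if abs(dy) <= 2 and abs(dx) <= 2:
                # Conventional 5x5 running-average window.
                box_sum += value
                box_count += 1
            if (dy, dx) != (0, 0):
                k = octant_index(dy, dx)
                sums[k] += value
                counts[k] += 1
    means = [s / c for s, c in zip(sums, counts)]
    # Minimum of the two means within each diametrically opposite pair.
    pair_minima = [min(means[k], means[k + 4]) for k in range(4)]
    surround = sum(pair_minima) / 4
    # Linear combination, for example 75% octant output and 25% 5x5 average.
    return mix * surround + (1 - mix) * (box_sum / box_count)
```

The default `mix=0.75` follows the typical 75%/25% combination mentioned above; other weights may suit other material.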

The invention so far described involves filter windows of particular sizes and shapes. It will be apparent to the skilled person that other sizes and shapes may be chosen without departing from the scope of the invention. For example, the line segments of the star pattern in FIG. 4 may contain fewer or more than the seven samples shown. The pattern may also have fewer or more than the eight line segments shown. Likewise, the square window shown in FIG. 6 may be smaller or larger than the 15×15 window shown, and the eight octants may be replaced by suitable numbers of other shapes, for example four quadrants or sixteen sedecants. The window need not be square: for example, windows that are polygonal with other than four sides, or that are approximately circular, may also be used. It is also possible to combine error value samples from overlapping segments of the filter window without departing from the scope of the invention.

The above description is based on displaced field differences. Other measures of pixel-to-pixel dissimilarity may also be used, including but not limited to: nonlinear functions of displaced field difference, displaced field differences between noise-reduced fields, Euclidean or other distances between multidimensional signals, for example RGB signals, and differences between feature point descriptors.
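As one illustration of such an alternative measure, the Euclidean distance between RGB pixels might be computed as in the following Python sketch. The function name and the representation of each pixel as an (R, G, B) tuple are assumptions made for the example.

```python
import math


def rgb_dissimilarity(current, displaced):
    """Per-pixel Euclidean distance between two fields of (R, G, B) tuples."""
    return [
        [math.dist(p, q) for p, q in zip(current_row, displaced_row)]
        for current_row, displaced_row in zip(current, displaced)
    ]
```

The resulting array can be filtered in the same way as an array of rectified displaced field differences.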

The implementations of the filters have been described in terms of serial processing of streams of values, typically ordered according to a scanning raster. Of course the skilled person will appreciate that many other implementations of the inventive filters are possible, including, for example, the use of random-access field or frame stores or programmable apparatus. And, as explained in the introduction, filtering according to the invention may be applied to measures of dissimilarity between subsamples or regions of an image.

Although motion-compensated processing of images is typically applied to a time sequence of images, the same process may be used with spatial image sequences, where the sequence is a sequence of different views of a common scene, or a sequence of different views captured in a time sequence. The current invention is equally applicable to the processing of these other types of image sequence. The invention may also be applied where the pixel-to-pixel dissimilarity values are derived not from motion or other comparison of different images but by comparing different regions of the same image, for example to test a prediction. Motion compensation may itself be regarded as a form of prediction so the term predictor may be used here to include a motion vector; a displacement from one region of an image to another image; as well as other forms of predictor.

1. In a video processor, a method of analysing an array of pixel-to-pixel dissimilarity values to select a pixel which has a low pixel-to-pixel dissimilarity value and which has neighbouring pixels which have a low pixel-to-pixel dissimilarity value; the method comprising the steps of: receiving a spatial array of pixel-to-pixel dissimilarity values, each pixel-to-pixel dissimilarity value representing a difference between the value of a pixel in a first spatial pixel array representing a first image and the value of a corresponding pixel in a second spatial pixel array representing a second image; dividing the pixels neighbouring the current pixel into at least two polar sectors; and selecting the current pixel when the current pixel and the pixels of at least one polar sector have low dissimilarity values.
2. The method according to claim 1 in which each dissimilarity value is a rectified field difference between fields displaced through motion compensation.
3. The method according to claim 1 in which the polar sectors are non-overlapping.
4. A video sequence processing apparatus for analysing motion-compensated pixel-to-pixel dissimilarity values, the apparatus comprising a spatial filter configured to operate on motion-compensated pixel-to-pixel dissimilarity values in which the filter aperture which defines those values on which the filter operates is decomposed into two or more polar sectors and the filter comprises a plurality of partial filters applied respectively to each sector; and a combiner for combining the respective outputs of said partial filters by a non-linear operation.
5. The apparatus according to claim 4 in which the combiner operates to take minimum values of partial-filter outputs from pairs of sectors that are diametrically opposite each other in the filter aperture.
6. The apparatus according to claim 4 in which the partial filtering operation is a rank-order operation.
7. The apparatus according to claim 4 in which the partial filtering operation is an averaging operation.
8. The apparatus according to claim 4 in which the minimum values from pairs of sectors are processed by a rank-order operation.
9. The apparatus according to claim 4 in which the minimum values from pairs of sectors are processed by an averaging operation.
10. A non-transitory computer readable medium comprising computer-readable instructions that when executed by a programmable apparatus implement a method of analysing an array of pixel-to-pixel dissimilarity values to select a pixel which has a low pixel-to-pixel dissimilarity value and which has neighbouring pixels which have a low pixel-to-pixel dissimilarity value; the method comprising the steps of: receiving a spatial array of pixel-to-pixel dissimilarity values, each pixel-to-pixel dissimilarity value representing a difference between the value of a pixel in a first spatial pixel array representing a first image and the value of a corresponding pixel in a second spatial pixel array representing a second image; dividing the pixels neighbouring the current pixel into at least two polar sectors; and selecting the current pixel when the current pixel and the pixels of at least one polar sector have low dissimilarity values.